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Optimal  Path  Planning  of  a  Target  Following  Fixed-Wing  UAV  Using 

Sequential  Decision  Processes* 

Stanley  S.  Baek*.  Hyukseong  Kwon*,  Josiah  A.  Yoder*,  and  Daniel  Pack* 


Abstract*—*  In  this  work*  we  consider  the  optimal  path  of  a 
fixed-wing  unmanned  aerial  vehicle  (UAV)  tracking  a  mobile 
surface  target-  One  of  the  limitations  of  fixed-wing  UAVs  in 
tracking  mobile  targets  is  the  lack  of  hovering  capability  when 
the  target  moves  much  slower  than  the  minimum  UAV  speed, 
requiring  the  UAV  maintain  an  orbit  about  the  target.  In  this 
paper,  we  propose  a  method  to  find  the  optimal  policy  for  fixed- 
wing  UAVs  to  minimi/e  the  location  uncertainty  of  a  mobile 
target.  Using  a  grid -based  Markov  Decision  Process  (MDP),  we 
process  a  policy  iteration  algorithm  offline  to  find  the  optimal 
UAV  path  in  a  discretized  state  space.  Based  on  the  offline 
optimal  policy,  we  generate  a  finer  grid  MDP  for  the  region  of 
interest  to  efficiently  process  an  online  policy  iteration  to  find 
in  real-time  the  optimal  trajectory  for  a  UAV-  We  validate  the 
proposed  algorithm  using  computer  simulations.  Comparing 
the  simulation  results  with  other  methods,  we  show  that  the 
proposed  method  has  up  to  13%  decrease  in  error  uncertainty 
than  ones  resulted  using  conventional  methods- 

I-  Introduction 

Over  the  last  few  decades,  engineers  and  researchers 
have  made  remarkable  progress  toward  the  development  of 
unmanned  aerial  vehicles  (UAVs)  in  both  civil  and  military 
applications  including  intelligence  collection,  surveillance, 
reconnaissance,  and  environmental  monitoring,  to  name  a 
few.  For  such  applications,  ground  or  water  surface  target 
tracking  is  a  fundamental  and  challenging  task.  Since  the 
sensing  capabilities  of  UAVs  are  constrained  by  the  UAV 
dynamics  and  the  onboard  sensor  capabilities,  intelligent  path 
planning  for  UAV s  to  maximize  target  localization  accuracy 
is  an  important  area  of  research. 

Unlike  rotary-wing  UAVs  capable  of  hovering  over  a 
mobile  surface  target,  fixed-wing  UAVs  must  must  maintain 
some  minimum  speed  to  produce  enough  lift  force  and  plan 
ahead  when  tracking  targets  that  move  slower  than  UAVs, 
The  optimal  path  of  a  fixed-wing  UAV  following  a  target 
would  be  one  of  the  paths  shown  in  Fig.  1,  depending  on 
target  speeds.  If  the  target  speed,  Vti  is  greater  than  the 
minimum  UAV  speed,  V^in,  and  less  than  the  maximum 
UAV  speed,  the  simplest  way  to  follow  the  target  is 

to  match  the  UAV  speed  with  the  target  speed  as  shown  in 
Fig.  la.  For  <  V^in,  the  UAV  can  no  longer  maintain  a 
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Fig,  I:  The  paths  of  a  fixed-wing  UAV  following  a  target 
traveling  at  the  speed  of  (a)  V^m  <  V1  <  (b)  V 1  < 

V^in ,  (c)  V1  «  V£ln  (d)  Vt  —  0,  where  Vi  is  the  target 
speed  and  Vu  is  the  UAV  speed- 


straight  path  to  follow  the  target.  Instead,  the  UAV  path  must 
be  a  sinusoidal  path  as  shown  in  Fig,  lb.  As  V1  becomes 
much  less  than  V^in>  a  suitable  path  will  be  an  orbital  motion 
about  the  target.  As  an  extreme  case  where  the  target  is  not 
moving,  the  optimal  path  will  be  a  circular  path  about  the 
target.  In  this  paper,  we  propose  a  method  to  find  in  real-time 
the  optimal  path  of  a  fixed-wing  UAV  following  a  target  at 
various  velocities,  and  we  demonstrate  the  optimal  path  is 
indeed  very  close  to  one  of  the  aforementioned  paths- 
A  number  of  engineers  have  introduced  a  variety  of 
path  planning  algorithms  for  target  tracking.  Beard  et  aL 
proposed  a  system  architecture  for  assigning  and  intercepting 
targets  using  cooperative  UAV  systems  [3].  In  their  method, 
coordination  and  negotiation  between  UAVs  were  made. 


considering  estimated  time  for  a  UAV  to  reach  a  target 
(arrival  time  at  target)  and  available  threats  from  scattered 
obstacles,  Tisdale  et  ah  repotted  practical  UAV  control  strate¬ 
gies  for  fixed-wing  aircraft  using  adaptable  receding-horizon 
control  [11].  In  their  paper,  an  increasing  horizon  planner 
was  proposed  to  find  appropriate  optimization  strategies  for 
different  situations.  He  et  at.  suggested  a  quardrotor  path 
planning  approach  based  on  a  belief  network  for  tracking 
ground  targets  in  an  urban-like  (road  networked)  environ¬ 
ment  [6],  The  belief  of  each  targefs  pose  was  modeled  with 
a  multi-modal  Gaussian  distribution* 

Compared  to  estimating  al!  feasible  and  continuous  UAV 
paths  to  acquire  the  optimal  UAV  path,  it  is  often  more 
efficient  to  discretize  a  continuous  state  space  into  a  state 
space  with  a  fixed  number  of  discrete  states.  Additionally, 
the  nonholonomic  motion  of  fixed-wing  UAVs  reduces  the 
number  of  available  choices  for  the  next  UAV  states.  Lav  is 
et  ah  proposed  a  search  and  tracking  approach  for  moving 
targets  by  dynamically  changing  the  size  of  the  search  space 
to  contain  only  the  high-probability  target  locations.  This 
improves  the  computational  efficiency  and  allows  targets 
to  be  tracked  beyond  the  original  search  area  [7].  Their 
approach  used  recursive  Bayesian  estimation,  maintaining 
two  different  probabilistic  density  functions  for  the  update 
and  prediction  stages.  Another  method  developed  by  Yu  et 
ah  suggests  a  cooperative  target  tracking  approach  using  dy¬ 
namic  occupancy  grids  of  the  estimated  target  locations  [13]. 
To  model  occupancy  grids  for  probabilistic  path  planning, 
they  used  a  Bayesian  filter  and  a  second-order  Markov  chain. 

Among  discretized  approaches,  the  Markov  Decision  Pro¬ 
cess  (MDP)  [4]  is  a  powerful  method  for  its  optimality. 
Akselrod  et  ah  applied  a  decentralized  MDP  approach  to 
solve  a  multiple  target  tracking  assignment  problem  with 
multiple  UAVs  [1].  In  their  modified  decentralized  MDP 
method,  the  transitions  and  observations  of  each  agent  are 
considered  independent  to  obtain  the  polynomial  algorithm. 
Yeow  et  ah  proposed  a  hierarchical  MDP  for  multiple  target 
tracking  in  wireless  sensor  networks  [12].  By  using  the 
clustered  sensor  network,  an  energy  efficient  and  distributed 
multiple  target  tracking  network  can  be  constructed.  Al- 
Sabban  et  al.  [2]  demonstrated  a  path  planning  algorithm 
for  a  UAV  using  MDP  to  minimize  energy  consumption  by 
exploiting  wind-energy  distributed  in  grid  cells. 

In  this  paper,  we  propose  an  optimal  UAV  path  planning 
technique  while  maintaining  appropriate  target  detectability 
with  sequential  decision  processes.  In  order  to  acquire  the 
optimal  path,  an  MDP  with  grid-based  discretization  is 
applied  to  estimate  costs  for  each  UAV  path  decision.  The 
main  contributions  of  this  work  are  i)  a  computationally 
efficient  state  space  reduction  technique,  2)  an  MDP  design 
for  the  optimal  path  planning  of  a  fixed-wing  UAV  tracking 
a  mobile  target,  and  3)  a  real-time  algorithm  to  compute  the 
optimal  policy. 
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Fig.  2:  A  model  of  a  UAV  following  a  surface  target  in  (a) 
the  world  coordinates  and  (b)  the  target-fixed  coordinates 


and  heading.  However,  because  we  continuously  refine  our 
estimates  of  the  target’s  velocity,  our  approach  allows  us 
to  track  any  moving  target.  Although  it  is  simple,  the 
“almost  constant  velocity”  method  has  been  shown  to  be 
quite  effective  in  many  applications,  and  our  experiments 
demonstrate  that  the  approach  works  well  at  tracking  targets 
which  change  their  speed  and  heading. 

The  coordinate  system  that  we  use  in  our  planning  al¬ 
gorithms  is  fixed  to  a  moving  target.  To  build  an  intuitive 
understanding  of  how  our  approach  can  handle  uncertainty  in 
both  target  and  UAV  motion,  in  this  section,  we  briefly  derive 
the  uncertainty  of  a  UAV  in  the  coordinate  frame  of  a  moving 
target  based  upon  the  uncertainties  in  a  fixed  reference  frame. 

Suppose  in  a  world-fixed  Cartesian  reference  frame 
(*»!&»),  shown  in  Fig.  2a,  the  target  is  located  at  x^r  t 
R2  and  moving  with  velocity  £  R2.1  Suppose  further 
that  we  are  able  to  produce  estimates  of  the  UAV  location 
and  velocity,  respectively,  x™t  ^  V(x£,P">jk)  and 
Af  (v£ ,  iV).  Similarly,  we  have  estimates  of  the  targets  lo¬ 
cation  ~  PltX)  and  velocity  v*,  -  P^v). 

We  now  define  a  second  coordinate  system  (x,  y)t  whose 
origin  is  fixed  to  the  target  shown  in  Fig.  2b.  Since  we 
will  use  this  coordinate  system  throughout  the  remainder 
of  the  paper,  we  use  no  prefix  for  simplicity.  We  set  the 


IL  Dynamic  Model 

The  path  planning  approach  used  in  this  paper  is  built 
on  the  assumption  of  a  target  traveling  at  a  constant  speed 


’We  assume  the  motion  of  the  UAV  is  constrained  in  2D  space,  he,,  the 
altitude  of  the  UAV  is  constant.  This  assumption  is  valid  since  we  plan  to 
extend  our  method  to  multiple  UAVs  tracking  multiple  targets,  and  we  plan 
to  have  each  UAV  maintain  its  altitude  to  avoid  collision. 


(»J+1)S  v 
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8*  is  defined  by 

P{8*  =  8  +  a\$ya)  =  p 

p{e'  =  ^  +  o  +  — |S(o)  = 

m'  '  2 

P(9'  =  6  +  a-—\s,a)  = 

m  w  2 

where  pe  [0,  lj  is  the  probability  of  the  next  heading 
angle  without  disturbances.  In  other  words,  the  resulting 
orientation  of  the  UAV  will  be  the  same  as  the  intended 
orientation  with  probability  of  p.  Since  we  consider  the 
target  is  moving  at  a  constant  speed  for  each  iteration 
of  MDP,  the  state  transition  of  target  speed  is 


Fig,  4:  State  transition  of  a  discrete  state  space  MDP  in  a 
grid.  The  probability  distribution  of  the  discretized  states  at 
k  +  1  is  inversely  proportional  to  the  distance  from  vertices 
and  the  expected  location  of  the  UAV  at  HI 


^t+i  —  @l  +  Uf  4-  w$ 

#t+i  =  Xt  +  cos$>  —  &Tvt  4-  wx 
?/t+ 1  -  Vt  4-  A  Tut  sin  4>  +  wy 
%+ 1  =  vt  4-  wv 

where  ut  is  the  speed  of  the  UAV  at  time  t,  vt  is  the 
speed  of  the  target,  at  6  A  is  the  input  steering  angle, 
At  is  the  time  duration  for  a  single  time  step,  0  = 
8t+at  is  the  expected  heading  angle  at  f+1,  and  w'$  are 
the  zero  mean  Gaussian  random  noises.  For  the  discrete 
state  transition,  we  first  take  the  expectation  of  the  next 
continuous  state  and  find  the  probability  distribution 
based  on  the  distance  between  the  nearest  four  vertices 
and  the  expected  location  of  the  UAV  as  shown  in  Fig,  4. 
The  probability  of  the  next  grid  location,  qf  £  Z2,  is 
given  by  the  inverse  of  the  distance  between  q*  and  the 
expected  location  of  the  UAV  divided  by  the  sum  of  the 
inverse  of  distances  between  the  nearest  four  vertices 
and  the  estimated  UAV  location,  Le., 


P(vic+ 1  =Vk\s,a)  =  1  (3) 

*  7Z  :  S  — *  JR  is  the  expected  reward  for  each  state 
transition.  In  general,  the  reward  of  an  MDP  is  a 
function  of  the  current  state,  the  action  applied  to  the 
current  state,  and  the  next  state.  For  this  work,  however, 
the  reward  is  just  a  function  of  the  current  state,  Le.,  the 
proximity  of  the  UAV  to  the  target,  (xkj  yk).  defined  by 

*•(«)  =  Ca  -  (xt  +  yl)  i ,  (4) 

where  Cmax  is  the  maximum  range  of  the  gimbaled 
camera  onboard  the  UAV,  Equation  (4)  implies  a  neg¬ 
ative  reward  for  the  target  not  in  the  UAV*s  sensor 
range.  The  maximum  distance  to  the  target,  Cma* 
is  determined  by  the  camera  specification.  Since  we 
assume  that  we  use  a  gimbaled  camera,  we  can  change 
the  orientation  from  the  sensor  to  the  target  so  that  the 
camera  sees  the  target  at  all  times.  However,  allowable 
field  of  view  (FOV)  and  pixel  size  of  the  target  to 
recognize  the  ground  target  should  be  an  important 
factor  to  determine  the  value  of  C7max,  Let  us  assume 
the  allowable  FOV  angle  to  be  a/,  the  resolution  (width 
or  height)  of  image  plane  as  P>  the  expected  target  size 
on  the  ground  to  be  tacu  and  the  minimum  size  of  the 
target  in  the  image  plane  as  tpix,  then  we  can  have  the 
following  relation: 

Pixel  Size  Target  Size 


m. 


Li+m,j+n 


iCiij)  — 


^a+m,j+n 


£***0.1} 


(i) 


for  m,  n  £  {0,1},  where  dij  is  the  distance3  between 
the  expected  UAV  location,  (E[a?*+i]f  E[jfe+i])  ,  and  the 
vertex  at  (i,j)  and 


Image  Resolution  Ground  Coverage 

If  we  express  the  above  relation  approximately  with 
variables, 

2tan{Q//2)  tact 

~  -  — -  -  —  (6) 

1  lpix 


Cij  =  (LE[a:t+i]/^J,  LE^+il/^J)  (2)  Then  Cmax  is  estimated  as 


where  gw  and  gh  are,  respectively,  the  grid  cell  width 
and  height  and  |_cj  is  the  largest  integer  not  greater  than 
a  The  probability  distribution  of  the  next  heading  angle 


Cmax  ■ — 


P _  tact 

2tan(a//2)  tm 


B.  Policy  Iteration 


(7) 


JTo  avoid  singularity  when  the  distance  is  zero,  we  added  a  small  value 
e  fti  0.0001  to  the  distance. 


In  MDP,  the  value  of  a  state  $  £  S  under  a  policy  tt(s)  <e 
Af  denoted  by  V^s),  is  the  expected  return  when  starting 


Fig.  5:  Fine-grained  grid  for  an  MDP  covering  the  area 
(depicted  in  gray)  near  the  optimal  path  obtained  from  the 
lookup  table  generated  by  the  policy  iteration  algorithm  using 
a  coarse-grained  grid. 


in  s  and  following  i r  thereafter,  as  defined  in  [10] 

[rt+i  +  7rt+2  +  72n+3  +  --|s(  =  s] 

=  [rt+i  +  yrV* (st+\  |st  =  s] 

=  E  F*(s’ E  T(s> a ’  s0  [*•($)  +  7^*(*'](38) 

a  $' 

where  sf  is  the  next  state  followed  by  the  current  state  s  when 
the  action  a  is  applied,  P7r(s^a)  is  the  probability  of  taking 
action  a  in  state  s  under  policy  tt,  and  the  discount  factor 
0  <  7  <  I  modulates  the  effect  of  future  rewards  on  present 
decisions.  Equation  (8)  is  called  the  state- value  function  for 
policy  7T,  and  this  policy  can  be  evaluated  iteratively  by 

VR-1 W  <-  E  T(s’  »<*>. s')  [*■(«)  +  tW)].  (9) 

s' 

where  Vk  is  the  state-value  function  at  the  k- th  iteration.  Each 
policy  evaluation  defined  by  (9)  is  started  with  the  value 
function  for  the  previous  policy.  Using  this  updated  state- 
value  function  14,  we  obtain  a  better  policy  using  policy 
improvement  given  by 

n'(s)  =  arg  max  Y'  T(s,  a,  s')  [r(s)  +  yV£  (s')] ,  (10) 

a 

s' 

where  i /(s)  is  the  new  greedy  policy.  Starting  from  Vo(^)  = 
0  and  7r(s)  arbitrarily  Vs  €  S ,  we  repeat  (9)  followed  by 
(10)  until  tt(s)  converges  to  the  optimal  policy  n*  [5].  The 
pseudocode  for  the  policy  iteration  algorithm  can  he  found 
in  [9],  [10]. 

Using  die  policy  iteration,  we  find  the  optimal  policy  for 
all  discrete  states.  One  of  the  problems  in  this  algorithm, 
however,  is  that  It  is  computationally  very  expensive.  As  a 
consequence,  it  is  impractical  to  be  used  in  real  time  by  a 
small  UAV,  The  method  we  developed,  however,  uses  the 
algorithm  offline  for  a  number  of  fixed  target  velocities  and 
uses  the  computed  optimal  policy  as  a  lookup  table  in  real¬ 
time.  This  lookup  table  method  works  well  for  a  constant 
target  speed,  but  sometimes  it  is  susceptible  to  vaiying  target 
speed  with  observation  noise. 


TABLE  I:  Specifications  of  offline  and  online  policy  itera¬ 
tions 


Offline 

Online 

Grid  size 

700  m  x  700  m 

Area  of  interest 

Grid  cell  size 

12  m  x  12  m 

3  m  x  3  m 

Max  Sensor  Range 

300  m 

300  m 

#  of  orientation  angles 

64 

64 

#  of  possible  input  angles 

5 

5 

Minimum  turning  radius 

60  m 

60  m 

Target  speeds 

{0, 1,  *.*,  12}  m/s 

[0,  Vmi*] 

discount  (7) 

0.95 

035 

In  addition  to  the  lookup  table  method,  we  have  developed 
an  algorithm  that  improves  the  lookup  table  method*  For 
a  given  state,  we  create  a  fine-grained  grid  for  an  MDP 
covering  the  area  near  the  optimal  path  obtained  from  the 
lookup  table  as  shown  in  Fig,  5.  The  number  of  states 
required  is  now  much  less  than  the  offline  MDP  because  we 
only  need  to  cover  a  small  region  (an  example  is  depicted  in 
gray  in  Fig.  5),  For  this  new  MDP,  we  set  a  large  reward  for 
the  goal  states  (depicted  in  dark  gray  in  Fig,  5)  and  process 
online  policy  iteration*  Since  the  initial  policy  is  already  near 
optimal,  the  policy  jr  converges  to  the  optimal  policy  tt* 
much  faster  than  the  offline  algorithm* 

IV.  Results  and  Analysis 

We  processed  the  offline  policy  iteration  for  target  speeds 
of  0,  1,  ***,  12  mis  using  the  specifications  listed  on  Table  L 
For  each  target  speed,  the  process  took  approximately  40 
hours.4  Figure  6  shows  UAV  paths  following  targets  with 
constant  speeds.  For  a  target  moving  much  slower  than  the 
UAV  (the  speed  ratio  of  the  UAV  to  the  target  a  is  less 
than  approximately  0.25),  the  optimal  path  of  the  UAV  is 
a  spiral  motion  as  shown  in  Fig.  6a.  As  an  extreme  case, 
the  UAV  path  becomes  a  circular  orbit  about  the  target  as 
the  target  speed  approaches  zero*  On  the  other  hand,  the 
optimal  UAV  path  following  a  target  slightly  slower  than  the 
UAV  (a  >~  0,5)  is  a  sinusoidal  path  as  shown  in  Fig,  6c, 
Fig  6b  shows  the  optimal  path  for  the  UAV  following  the 
target  when  ^  0.25  <  a  0.5.  As  a  rather  simple  case, 
if  a  target  moves  at  a  speed  greater  than  the  minimum  UAV 
speed  but  less  than  the  maximum  UAV  speed,  the  optimal 
path  would  be  linear.  As  we  expected,  these  results  agree 
with  the  predicted  paths  shown  in  Fig*  1. 

For  the  online  simulations  as  specified  on  Table  I,  we  first 
estimated  the  target  velocity  and  selected  the  optimal  policy 
and  path  from  the  offline  results  based  on  the  rounded  target 
speed.  Then,  we  generated  a  new  MDP  for  the  estimated 
target  speed  with  the  cell  size  of  3  m  x  5  m.  Although  we 
had  a  finer  granularity,  the  total  number  of  states  was  orders 
of  magnitude  smaller  than  the  offline  MDP*  The  average  time 
of  computation  was  approximately  !  second  for  each  time 
step. 

We  have  generated  a  number  of  target  paths  with  their 
target  velocities  varying  along  the  paths*  The  UAV  with 

4We  used  Matfab  on  an  Intel  L7  microprocessor  to  nm  the  algorithm. 


(a) 


Fig,  6;  The  optimal  paths  of  12  m/s  UAV  tracking  a  target 
moving  at  the  speed  of  (a)  1,3  m/s,  (b)  4,0  m/s,  and  (c)  1  LG 
m/s. 


online  policy  iteration  algorithm  follows  the  target  based  on 
the  noisy  observations  of  the  target  locations  with  the  noise 
variance  of  5  meters  in  both  £  and  y  directions, 

A  cycloid  path  of  a  target  shown  in  Fig.  7  has  target 
velocities  from  0  to  1 1.0  m/s  and  it  changes  its  direction  by 
1 80°  once  each  period.  At  the  beginning  of  target  tracking, 
the  UAV  tries  to  make  a  spiral  path  for  the  slow  motion  of  the 
target.  As  the  UAV  finds  the  target  speed  increasing,  the  UAV 
path  becomes  oscillatory  tin  til  the  target  speed  reaches  the 
minimum  UAV  speed.  As  the  target  slows  down  its  speed  and 


Fig.  7:  UAV  following  a  target  on  a  cycloid  path.  The  target 
velocity  varies  from  0  to  II  m/s.  The  sampling  time  of  the 
path  is  1  second,  but  the  intervals  between  dots  are  5  seconds 
for  high  visibility. 


Fig.  8:  UAV  following  a  target  on  a  cycloid  path.  The  target 
velocity  varies  from  IJ  to  13.0  m/s. 


changes  its  heading  direction  at  the  end  of  one  full  cycle  of 
the  cycloid  path,  the  UAV  makes  a  couple  of  spiral  motions 
until  it  makes  another  oscillatory  path  to  follow  the  target 
The  average  root  mean  square  error  (RMSE)  of  distance 
between  the  UAV  and  the  target  is  87.68  meters. 

An  arrowhead  path  shown  in  Fig.  8  is  also  interesting.  The 
path  has  straight  lines,  sharp  turns,  and  a  slight  turn  with  the 
velocity  varying  from  1.1  to  13,0  m/s.  The  target  starts  from 
the  origin  and  moves  toward  the  positive  x  direction  at  the 
beginning.  As  the  target  rapidly  slows  down  to  make  a  sharp 
turn,  the  UAV  happens  to  fly  away  from  the  target,  but  it  soon 
catches  up  with  the  target.  The  average  RMSE  between  the 
UAV  and  the  target  is  72. 15  meters. 

The  zigzag  path  shown  in  Fig.  9  is  not  a  realistic  path 
since  there  are  90°  turns  with  constant  velocity.  The  zigzag 
path,  however,  is  a  good  example  to  test  overshoot  behaviors. 
The  average  RMSE  between  the  UAV  and  the  target  is  64.77 
meters  for  the  zigzag  path. 

We  compared  the  results  with  those  generated  using  other 
methods.  Rafi  et  al.  [8]  proposed  a  simple  technique  for 
short-term  path  planning  to  minimize  the  long-term  distance 
from  the  target.  Rather  than  approaching  the  target  directly, 


Fig,  9:  UAV  following  a  target  on  a  zigzag  path  with  90° 
turns.  The  target  velocity  is  10,5  m/s. 


TABLE  II:  Root  Mean  Square  Errors  (meters) 


Lookup  tabled 

Online* 

Rafi  c 

Heuristic 

Cycloid 

103.23 

87.68 

1 10,89 

100.95 

Arrowhead 

111,21 

72,15 

76.38 

77.13 

Zigzag 

94,75 

64,77 

82.44 

107,69 

Computational  Time** 

fj,$ 

M  Sec 

^35 

"'--35  jis 

"Generated  by  the  offline  policy  iteration 
b Online  policy  iteration  based  on  the  policy  from  the  offline 
policy  iteration, 

^Rafi  ei  aL  [SI 

d Using  Matlab  on  an  Intel  i7  microprocessor 


UAVs  approach  a  tangent  line  to  a  minimum-radius  orbit 
around  the  predicted  target  location.  When  already  inside 
the  orbit*  they  fly  straight  ahead. 

We  also  developed  a  heuristic  method  that  a  UAV  tries  to 
keep  the  distance  of  the  minimum  orbital  radius  from  the 
target.  We  also  assumed  that  the  UAV  had  five  choices  of 
yaw  angle  deviation  at  each  time  as  discussed  before.  Among 
the  UAV  motion  choices,  the  UAV  selected  the  motion  that 
made  the  distance  between  the  estimated  UAV  location  and 
the  predicted  target  location  closest  to  the  desired  (minimum) 
orbital  radius.  The  predicted  target  location  at  time  t+1  were 
determined  by  x£+1  =  xjj.  +  At{x£  -  xj^J. 

Fig,  10  shows  the  simulation  results  of  the  Rafi  and  the 
heuristic  methods.  The  results  in  Table  II  show  that  our 
optimal  solution  is  superior  to  the  two  methods.  For  the 
cycloid  path,  our  approach  has  performed  with  approximately 
13%  decrease  in  error.  Although  the  computational  time5  of 
offline  policy  iteration  lookup  table  method  is  small,  that  of 
online  policy  iteration  method  is  much  greater  than  the  other 
methods.  We  are  currently  working  toward  both  reducing 
RMSE  for  offline  policy  iteration  method  and  decreasing  the 
computational  time  for  online  policy  iteration  method. 


5 Since  it  is  difficult  to  compare  the  computational  costs  with  big  O 
notations,  we  provide  with  the  computational  time  for  reader  to  get  a  sense 
of  computational  costs. 
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Fig.  10:  The  paths  generated  by  the  Raff  s  method  and  the 
heuristic  method  for  (a)  a  cycloid  path  and  (b)  an  arrowhead 
path  (c)  a  zigzag  path. 


V.  Conclusion  and  Future  Work 

Ip  this  paper,  we  have  shown  a  new  optimal  UAV  path 
planning  method  to  track  a  mobile  target.  Our  method  uses 
the  theory  of  Markov  Decision  Processes  to  select  the  opti¬ 
mal  sequence  of  UAV  waypoints  to  follow  the  surface  target 
while  minimizing  target  localization  uncertainties.  Through 
policy  iterations,  we  compute  the  globally  optimal  policy 
offline,  and  using  an  online  policy  iteration  algorithm,  we 
present  a  real-time  efficient  solution  to  find  the  optimal  path. 
The  simulation  results  show  how  our  method  automatically 


changes  between  various  tracking  regimes  such  as  sinusoidal 
slaloming  or  spiral  orbiting  as  the  target  velocity  changes. 

In  the  future,  we  plan  to  develop  more  accurate  models 
for  target  motion  and  use  them  for  intelligent  UAV  path 
planning.  To  this  end,  we  are  currently  working  on  state 
reduction  methods  to  decrease  the  number  of  discretized 
states  as  well  as  function  approximation  techniques  to  reduce 
the  computation  costs.  Along  with  these  improvements  in  our 
algorithm,  we  plan  to  verify  the  algorithm  using  hardware-in- 
the-loop  flight  simulations  and  to  perform  flight  tests  using 
UAVs  currently  available  at  our  facility. 
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